Rich t kid/introduce dict benchmarks by Rich-T-kid · Pull Request #21860 · apache/datafusion

Rich-T-kid · 2026-04-26T18:20:51Z

Which issue does this PR close?

This PR provides the benchmarks mentioned in #7647 & #9017

Works towards closing Materialize Dictionaries in Group Keys #7647.

Rationale for this change

Currently the benchmark suite doesn't have any dictionary-encoded tables with aggregations performed on them. This makes it difficult to prove performance improvements. For example, a separate PR I'm working on (#21765) is hard to validate because the existing benchmarks don't exercise this path. This PR attempts to close that gap.

What changes are included in this PR?

Adds a new dict benchmark to dfbench that measures group-by performance on dictionary-encoded columns across varying cardinality (5/10/25%), null rates (0/15%), and value types (Utf8 and List), covering both single and multi-column group-by scenarios.
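To make the benchmark dimensions concrete, here is a minimal, dependency-free sketch of a dictionary-encoded column parameterized by cardinality and null rate, with a group-by count that works on the keys alone. It is not the code in `benchmarks/src/dict.rs`: `DictColumn`, `make_dict_column`, `group_counts`, and the small pseudo-random generator are hypothetical names chosen for illustration, and the real benchmark builds Arrow dictionary arrays instead of plain vectors.

```rust
/// Hypothetical stand-in for a dictionary-encoded column:
/// each row holds a key into `values`, or `None` for a null row.
struct DictColumn {
    keys: Vec<Option<u32>>,
    values: Vec<String>,
}

/// Tiny linear congruential generator so the sketch needs no external crates.
fn next_u32(state: &mut u64) -> u32 {
    *state = (*state)
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 33) as u32
}

/// Build a column whose number of distinct values is `cardinality_pct` percent
/// of `num_rows` (at least 1) and whose rows are null with probability
/// `null_pct` percent.
fn make_dict_column(num_rows: usize, cardinality_pct: usize, null_pct: u32, seed: u64) -> DictColumn {
    let distinct = (num_rows * cardinality_pct / 100).max(1);
    let values: Vec<String> = (0..distinct).map(|i| format!("value_{}", i)).collect();
    let mut state = seed;
    let keys: Vec<Option<u32>> = (0..num_rows)
        .map(|_| {
            if next_u32(&mut state) % 100 < null_pct {
                None
            } else {
                Some(next_u32(&mut state) % (distinct as u32))
            }
        })
        .collect();
    DictColumn { keys, values }
}

/// Group-by count that never touches the string values:
/// one counter per dictionary entry, plus a separate null group.
fn group_counts(col: &DictColumn) -> (Vec<usize>, usize) {
    let mut counts = vec![0usize; col.values.len()];
    let mut nulls = 0usize;
    for key in &col.keys {
        match key {
            Some(k) => counts[*k as usize] += 1,
            None => nulls += 1,
        }
    }
    (counts, nulls)
}

fn main() {
    // 1,000 rows, 10% cardinality, 15% nulls: one point in the benchmark grid.
    let col = make_dict_column(1_000, 10, 15, 42);
    let (counts, nulls) = group_counts(&col);
    println!("dictionary entries: {}", col.values.len());
    println!("null rows: {}", nulls);
    println!("non-null rows: {}", counts.iter().sum::<usize>());
}
```

Grouping on the integer keys rather than the underlying strings is the kind of path these benchmarks are meant to exercise, which is why cardinality and null rate are varied independently.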

Are these changes tested?

--

Are there any user-facing changes?

no

kumarUjjawal

There's something wrong with GitHub, so I am not able to post a comment on the line number, but basically at line 372:

Is this check needed? `schema` is created from the same `query`, and `make_record_batch` always adds `dict_col2` when `query.col2` is `Some`. So this condition looks unreachable?
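The reasoning can be illustrated with a minimal sketch. It assumes, as the comment describes, that both the schema and the record batch branch on the same `query.col2`. The `Query` struct, the `schema_fields` and `batch_columns` helpers, and the `dict_col1` name are hypothetical stand-ins, not the code under review.

```rust
/// Hypothetical stand-in for the benchmark's query description.
struct Query {
    col2: Option<&'static str>,
}

/// Column names the schema would declare for this query (assumed shape).
fn schema_fields(query: &Query) -> Vec<&'static str> {
    let mut fields = vec!["dict_col1"];
    if query.col2.is_some() {
        fields.push("dict_col2");
    }
    fields
}

/// Column names the generated batch would contain (assumed shape);
/// it branches on the same `query.col2` as the schema does.
fn batch_columns(query: &Query) -> Vec<&'static str> {
    let mut columns = vec!["dict_col1"];
    if query.col2.is_some() {
        columns.push("dict_col2");
    }
    columns
}

fn main() {
    for query in [Query { col2: None }, Query { col2: Some("dict_col2") }] {
        // Both sides are derived from the same input, so they cannot disagree,
        // and a check for "schema has dict_col2 but the batch does not" never fires.
        assert_eq!(schema_fields(&query), batch_columns(&query));
    }
    println!("schema and batch agree for both query shapes");
}
```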

Rich-T-kid · 2026-05-06T15:58:25Z

Yea I agree. Removed it

Rich-T-kid · 2026-05-06T16:33:19Z

@kumarUjjawal linting error broke the CI, just pushed up a fix

kumarUjjawal

Looks good!

kumarUjjawal · 2026-05-07T05:49:15Z

Thank you @Rich-T-kid

## Which issue does this PR close?

benchmarks for apache#21765. Also related to apache#21860

The goal is to merge this PR and then rebase the branch on apache#21765 to contain these benchmarks, so that they can be run and compared to the original.

## Rationale for this change

Originally this was included in apache#21765 but that PR is already very large. I decided to move it to its own separate PR

## What changes are included in this PR?

Adds benchmarks for the dictionary encoding array path of **new_group_values()**.

## Are these changes tested?

n/a

## Are there any user-facing changes?

no

---------

Co-authored-by: Kumar Ujjawal <ujjawalpathak6@gmail.com>

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from 9c07966 to c465861 Compare April 26, 2026 18:30

Rich-T-kid added 2 commits April 26, 2026 14:37

introduce dictionary test

dff60d4

Revamp v1

7347a93

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from c465861 to 7347a93 Compare April 26, 2026 18:38

Rich-T-kid mentioned this pull request Apr 26, 2026

Optimize Dictionary groupings #21765

Open

Merge branch 'main' into rich-t-kid/Introduce-dict-benchmarks

bb648fa

Rich-T-kid mentioned this pull request May 3, 2026

Add benchmarks for dictionary path of new_group_values #22004

Merged

kumarUjjawal reviewed May 5, 2026

Comment thread benchmarks/src/dict.rs

Comment thread benchmarks/src/dict.rs Outdated

Comment thread benchmarks/bench.sh

revised with PR comments

96961e2

Rich-T-kid requested a review from kumarUjjawal May 5, 2026 21:36

kumarUjjawal reviewed May 6, 2026

Rich-T-kid requested a review from kumarUjjawal May 6, 2026 15:58

remove un-needed check

5befe3f

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from f9e5ee5 to 5befe3f Compare May 6, 2026 16:31

kumarUjjawal approved these changes May 6, 2026

Merge branch 'main' into rich-t-kid/Introduce-dict-benchmarks

7ca4a70

kumarUjjawal added this pull request to the merge queue May 7, 2026

Merged via the queue into apache:main with commit 6b27d2d May 7, 2026
35 checks passed

kumarUjjawal merged 6 commits into apache:main from Rich-T-kid:rich-t-kid/Introduce-dict-benchmarks


3 participants
